ABSTRACT

The report describes a file structure
which combines list-processing concepts
(for handling variable-length information
records) with standard serial record
arrangements (for identification information).
The file organization was designed
for a large chemical information system
and includes both well-structured and
unstructured (amorphous) information.
An investigation was made of representative
data inputs from the Department of
the Army. The data to be put into the
file, the nature of the file structure, and
the necessary programs for manipulation
of file information have been considered
as interdependent parts of a total system.
Computer programs have been initiated to
test the validity of the proposed approach.


Key  Words: 

file organization, chemical information, chemical structures,
linear notations, list processing, threaded lists,
hierarchical files, master files, satellite files, structured
files, level codes, pointers.


1.     INTRODUCTION 

This  report  describes  a  new  approach  to  the  structuring  of  a 
large  file  containing  diverse  information.      The  Army  Research  Office 
sponsored  the  research  here  described  in  support  of  its  requirements 
for  a  large  integrated  chemical  information  system. 


The Department of the Army has over five hundred installations
which handle technical information; approximately one third of them
handle information related in some way to chemistry. Many of the
laboratories handle chemical structures, but the other kinds of
information associated with the structures are diverse and vary
according to the mission and requirements of the particular laboratory.
The project here described has been concerned with an attempt to devise
a file structure which will permit the storage and retrieval of all
categories of information contained in any of the Army's laboratories (not all
of which exist in any one laboratory) in association with the structure of
a chemical compound.

Initial investigations were concerned with exploration of file
organization research undertaken by others in an effort to determine whether
techniques developed for different situations could be applied to the
organization and manipulation of chemical information. Samples of CIDS
"hard-core" data were obtained from various laboratories and a
procedure for organizing files of this information was devised. This
procedure combines the features of a fixed-length sequential master record
file with those of a variable-length non-sequential data file strung
together in a list-processing fashion. Present work includes programs
for file creation, updating and expansion. Future plans include the
creation of sub-files and the querying of the files.


2.     OBJECTIVES 

The project has been concerned with two considerations: long-range
objectives and short-term goals.

The  long-range  objectives  are: 

1.  To define some of the characteristics of large files, and to
develop the structure for a large file of heterogeneous
scientific and technical information including chemical
structure representations, with particular attention to the
necessity for:

a.  Manipulating information which is in some cases
formatted and in others completely amorphous.

b.  Making provision for inclusion of certain kinds of
information when it exists, and for later filling of gaps when
the information is not available at the time of file initiation.

c.  Creating a multi-level file of information so that
provision may be made for the inclusion of generic information,
as well as the addition of various levels of specificity as
required by the user, either simply because the additional
levels of specificity exist and might be useful at a later
date or because there is a user requirement for that
degree of specificity.

d.  Examining the inter-structuring of files that are part of
the larger files but which may be geographically separated.

2.  To provide list-processing capability in the file structure in
order to:

a.  Maintain flexibility in the file.

b.  Provide for an efficient means of updating.

c.  Permit additions to files, both in classes existing already
within the system and in the entry of new classes of
information.

d.  Free the system from the constraints of fixed-length and
formatted files.

e.  Permit aggregations of data from files that are
geographically separated.

3.  To investigate the techniques of file manipulation in order to
provide systems of sub-files, special files, desirable
redundancy in exchange for multiple access to information, and the
necessary keying or cross-referencing facility required for
such a system of multi-level, multi-subject files. This work
must also take into account the planning and development of
the several kinds of computer programs which are required
for file maintenance. (This report presupposes that all such
file manipulation will be carried out by computers.)

4.  To determine the kind of organization of information which
will most readily permit questioning of the file by different
groups of questioners who have varied (and varying)
requirements for the kinds of information contained in the file, and
who have, furthermore, requirements for differing degrees
of specificity in the information they are seeking.


The  short-term  goals  are  concerned  with  attempting  to  satisfy  the 
following  immediate  requirements: 

1.  To store specific categories of well-defined data in a computer.

2.  To devise procedures for questioning the file and retrieving
information  from  it. 

3.       To  employ  computer  search  programs  on  a  limited  basis  to 
test  the  validity  of  the  approaches. 


3.     BACKGROUND  STUDIES 

Theoretical studies and actual implementations of files developed
by others were investigated with a view toward incorporating those
approaches deemed applicable to large chemical information systems.
All of the file organizations studied were designed for structuring
information for automatic storage and retrieval, but were responsive to
different requirements. The studies investigated included variations of
list-processing techniques, hospital and clinical record keeping, and
chemical structure files containing associated corollary information.
Appendix A lists the general background material for the development
work here described; our work was a departure from any of these
particular systems because of the specific nature of the Army's requirements.

The classical concept of a master file containing all known
information for each entry in full detail was considered, but rejected because
of its requirement for large amounts of space and because of its
restraint on expansion in as yet unknown directions. Similarly, an
investigation was made of the idea of a "Control File" or "File Relater and
Locator System" which would consist of a separate file of identification
numbers of entries and the names of the satellite files wherein
information on these entries is located, together with the keys for addressing
them. After consideration of the types of data to be included in a large
chemical information file (see Section 4), it was decided as a first
approach to devise a system incorporating the features of the two plans
discussed above, i.e., a linear file consisting of basic fixed-field
information about the entry, plus pointers to satellite information files.


4.     TEST  DATA 

The U.S. Army (see Appendix A-1) lists the following eight types
of information to be included for each compound in their chemical
information and data system:

1.  Registration  or  identification  number 

2.  Chemical  structure,    probably  a  listing  of  atoms  and  bonds 

3.  Molecular  formula 

4.  Bibliographic  citations 

5.  Nomenclature,  including  chemical  names,    linear  notation, 
trade  names,    etc. 

6.  Location  of  data  files  information 

7.  Kinds  of  data  and  information  available  at  each  location 

8.  Security  classification  and  releasability 

One other item not listed is the source of the compound or material
that is the subject of the file entry.

The Army's file is designed to accommodate the entry of three
million compounds. It should be noted that the above list includes both
structured and unstructured information and that the list includes
referrals to satellite files.

The  Army  was  asked  to  supply  sample  data  for  the  experimental 
attempts  to  structure  the  files,    and  in  answer  six  organizations  supplied 
sample  forms  which  had  been  executed  in  accordance  with  a  set  of 
instructions  supplied  by  the  Army  (see  Appendix  B).    Some  of  the  test 
data  suppliers  were  Army  laboratories  and  others  were  organizations 
who  had  contracts  with  the  Army.      The  major  portion  of  the  information 
was  recorded  on  the  CIDS  Registry  File  -  Hardcore  Input  Forms  (SMUEA 
Form  13,    26  AUG  64).     (See  Appendix  C.  ) 

As might have been expected, there was considerable variation
not only in the content of the information supplied on the sample forms,
but in the completeness of the individual entries. Certain of the
organizations employed forms of their own design rather than those shown in
Appendix C. There were differences in interpretation among the various
organizations as to what was desired by the Army for certain categories
of information. All of the forms were incomplete in some respect, with
some having little more than a structure diagram and the name of the
reporting agency. Based on the use of the trial forms, and as noted in
the Conclusions (see Section 7), additional effort should be
expended on the design of forms and the development of instructions for
completing them in order to promote greater consistency.

Personnel at Frankford Arsenal compiled a list of 1008 different
categories of information in an attempt to identify all of the information
requirements of the Department of the Army. This list will remain
open-ended for the subsequent addition of new categories as need arises.
Any one installation would not contain all of the elements of information
which appear in the complete list. An analysis was made of the
information received from the six sources listed in Appendix B, and it was found
that the information supplied on the forms contained only a small number
of the categories from the list compiled by Frankford Arsenal. No
attempt was made to relate them directly to the Frankford list, but it was
observed that the elements of data encountered on the forms tended to be
formed into different classes rather than to comprise sub-sets of the
Frankford listing. It can be assumed that the data reported responded to
the requirements of the individual reporting agency.

For  the  sake  of  the  experiment,    it  was  desirable  to  treat  all  of  the 
data  from  the  six  reporting  agencies  in  a  uniform  way.      Therefore,    the 
following  composite  list  of  all  the  data  reported  was  formed;  no  one  set  of 
forms  contained  all  of  the  categories,    and  some  contained  only  a  few. 

1.  Reporting  agency 

2.  Security  classification 

3.  Release  restrictions 

4.  Local  control  number 

5.  Date 

6.  Registry  number 

7.  Molecular  formula 

8.  Nomenclature 

9.  Structure 


10.  Bibliographic  references 

11.  Types of data

12.  Key  words 

Notations (such as Wiswesser notation, Hayward notation, IUPAC
cipher, etc.) were included under nomenclature. These should form a
separate category since nomenclature also includes trivial names, trade
names, chemical names, common names, and others. (See
Conclusions, Section 7.) With respect to certain other categories of
information, some forms merely listed the presence of information in that
particular category, without supplying the actual information itself.


5.     PROPOSED  FILE  ORGANIZATION 

A  tentative  file  organization  to  handle  the  categories  of  information 
contained  on  these  forms  is  now  under  development.     It  will  serve  as  an 
experimental  model  for  the  purpose  of  ascertaining  from  the  proposed 
users  its  expected  utility  for  their  requirements. 

The  overall  file  system  will  consist  of  two  parts: 

1.  A master-file of fixed-length information.

2.  Information-files of variable-length information.

Further,    the  master  file  will  contain  keys  to  the  location,    size  and  type 
of  pertinent  entries  in  the  information  files. 

Each record in the master-file will be assigned a unique
identification (ID) number and each item in the information file will be tied to
its master-file record by the ID number and will be identified by a
"level code" to indicate its hierarchy in a two-dimensional chained-file
arrangement. The following list of such categories was used as an
experimental model, but is by no means an all-inclusive one. (Note
assigned "level code" at right.)


Table 1

LIST OF FILE CONTENTS BY CATEGORY

Category                                        Level Code

 1.  ID number
 2.  Reporting laboratory
 3.  Local control
 4.  Security
 5.  Date
 6.  Molecular formula                          010000
 7.  Notations                                  020000
     a.  Hayward                                021000
     b.  Wiswesser                              022000
 8.  Nomenclature                               030000
 9.  Types of data                              040000
     a.  Physical properties                    041000
     b.  Chemical properties                    042000
     c.  Physiological effects                  043000
         1.  Respiratory                        043100
         2.  Cardiac                            043200
         3.  Neuromuscular                      043300
     d.  Toxicity                               044000
         1.  Intravenous                        044100
         2.  Intramuscular                      044200
             a.  Rabbits                        044210
             b.  Rats                           044220
         3.  Oral                               044300
10.  Provision for supplementing categories in the above listing


Table 2 is a machine word schematic for the fixed-length master-file
record.


Table 2

SCHEMATIC FOR FIXED-LENGTH MASTER-FILE RECORD

Word no.

 1.   ID number                               \
 2.   Reporting laboratory                     |
 3.   Local control number                     |  identification
 4.   Security data                            |  data
 5.   Date                                    /
 6.   Pointer to molecular formula record     \
 7.   Pointer to notations record              |  pointers
 8.   Pointer to nomenclature record           |  to information
 9.   Pointer to types of data record          |  files
10.   Unassigned                              /

The fixed-length master-file record will occupy 10 machine words.
Words 1 through 5 consist of identification information. Words 6
through 10 are "pointer" words. They serve as indicators of the
presence or absence in the information file of their respective categories of
information with a key to location and length of record when presence is
indicated.

A typical pointer-word consists of three fields: one field contains
the level code for a particular category of information; the second field
contains the address where that particular block of information can be
found; the third field reveals the automatically-generated number of
machine words of data contained in that information record. Certain
addresses in the second (address) field of a pointer word imply special
conditions about the referred-to data. This will be illustrated with
specific information shortly.


Pointer Word

+------------+---------+-----------------+
| level code | address | number of words |
+------------+---------+-----------------+

Table 3 is a machine word schematic for a variable-length
information-file record.


Table 3

SCHEMATIC FOR VARIABLE-LENGTH
INFORMATION-FILE RECORD

Word no.

 1.   ID number
 2.   Previous item                           \
 3.   Current item                             |
 4.   Next item in → direction                 |  pointers
      (sub-class of current item)              |
 5.   Next item in ↓ direction                 |
      (parallel class to current item)        /
 6.                                           \
 .                                             |  information available
 .                                             |  in the category of
 .                                             |  the current item
 N                                            /

The variable-length information-file record contains five fixed
words at the beginning of each record: the first word is the ID number
(which is shown as word 1 in the master-file record in Table 2); the next
four words are pointers. Word #2 points backward to the previous item
in the hierarchy of levels, word #3 identifies the current item, and words
#4 and #5 point forward, allowing a branching in two directions. The
actual locations of the blocks of data are irrelevant, as long as the
pointers link them correctly. The basic filing algorithm is the
following: each information record filed is placed in a block whose
address is in an index register called OPEN. This address becomes the
address of the current item being filed and the index register is
adjusted so it contains the address of the next OPEN block.
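
The filing step just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: storage is modelled as a Python list with one entry per machine word, records are assumed to be packed one after another, and the names `InformationStore` and `file_record` as well as the placeholder data words are hypothetical and do not come from the report.

```python
class InformationStore:
    """Toy model of the information file: a flat array of machine words."""

    def __init__(self):
        self.words = []   # simulated storage, one list entry per machine word
        self.open = 0     # plays the role of the OPEN index register

    def file_record(self, record_words):
        """File one record in the block at OPEN, then advance OPEN.

        Returns the address given to the record, which a pointer word
        elsewhere in the file can then carry.
        """
        address = self.open
        self.words.extend(record_words)
        # Assumption: the next open block starts right after this record.
        self.open += len(record_words)
        return address


store = InformationStore()
# Five fixed words followed by one placeholder data word.
first = store.file_record(
    ["A0000007", "MASTER", "010000", "000000", "000000", "DATA-WORD"]
)
# A second record with the five fixed words only.
second = store.file_record(["A0000007", "MASTER", "020000", "000000", "000000"])
print(first, second, store.open)
```

Because every record is reached through pointers, the addresses returned here never need to reflect any logical ordering of the data.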

Let  us  consider  a  master-file  record  typical  of  those  submitted, 
where  information  is  given  under  each  category  listed  in  the  data  outline 
(Table  1)  except  for  nomenclature,    level  code  030000. 

Table 4

TYPICAL MASTER-FILE RECORD

Word no.   Contents              Explanation

 1.        A0000007              ID number
 2.        001500                Reporting laboratory code
 3.        0006000               Local control number
 4.        000000                Security information
 5.        120764                Date
 6.        010000  MOLE  07      Molecular formula
 7.        020000  HAYW  03      Notations
 8.        000000  0000  00      Nomenclature
 9.        040000  DATA  09      Types of data
10.        000000  0000  00      Provision for additional
                                 categories

Words 1 through 5 are identification data; words 6 through 10 are
pointers.



Words 6, 7, and 9 indicate that information pertaining to their
respective level-codes can be found at the named machine locations.
For example, word #6 indicates that seven words of information for
level-code 010000 (molecular formula) can be found at location MOLE.
Word #8 is zero, indicating that there is no information in the file for
level-code 030000 (nomenclature). Word #10 is blank, indicating that no
additional categories of information have been added to the master-file
record.
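
As an illustration of how such pointer words might be read, the following Python sketch decodes words 6 through 10 of the record in Table 4. The `PointerWord` class, the `parse_pointer` helper, and the whitespace-separated text form of each word are assumptions made for this example; the report does not specify how the three fields are packed into a machine word.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PointerWord:
    level_code: str   # e.g. "010000"
    address: str      # symbolic location, e.g. "MOLE"; "0000" when absent
    n_words: int      # length of the information record in machine words

    @property
    def present(self) -> bool:
        # Assumption: an all-zero level code marks an absent category.
        return self.level_code != "000000"


def parse_pointer(text: str) -> PointerWord:
    """Split a 'level-code address count' string into its three fields."""
    level_code, address, n_words = text.split()
    return PointerWord(level_code, address, int(n_words))


# Words 6 through 10 of the master-file record shown in Table 4.
MASTER_POINTERS = {
    6: parse_pointer("010000 MOLE 07"),
    7: parse_pointer("020000 HAYW 03"),
    8: parse_pointer("000000 0000 00"),
    9: parse_pointer("040000 DATA 09"),
    10: parse_pointer("000000 0000 00"),
}

present = {n: p for n, p in MASTER_POINTERS.items() if p.present}
for n, p in present.items():
    print(f"word {n}: level {p.level_code}, {p.n_words} words at {p.address}")
```

Only words 6, 7, and 9 survive the filter, matching the reading of the record given above.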

Note how the above master-file arrangement lends itself to
cursory scanning of "pointers" for the purpose of creating sub-files for
each major category. Further illustrations will demonstrate that this
facility for creating sub-files can be carried out quickly and efficiently
at any given level in the file.
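
A sub-file scan of this kind might look as follows. The sketch is hypothetical: only record A0000007 is taken from Table 4, while records A0000008 and A0000009, their addresses (LOC1 through LOC4), and the function name `build_subfile` are invented for illustration.

```python
# Each master record is modelled as
# (ID number, {word no.: (level code, address, word count)}).
# Only A0000007 comes from Table 4; the other two records are invented.
MASTER_FILE = [
    ("A0000007", {6: ("010000", "MOLE", 7), 7: ("020000", "HAYW", 3),
                  8: ("000000", "0000", 0), 9: ("040000", "DATA", 9)}),
    ("A0000008", {6: ("010000", "LOC1", 4), 7: ("000000", "0000", 0),
                  8: ("030000", "LOC2", 6), 9: ("000000", "0000", 0)}),
    ("A0000009", {6: ("010000", "LOC3", 5), 7: ("020000", "LOC4", 2),
                  8: ("000000", "0000", 0), 9: ("000000", "0000", 0)}),
]


def build_subfile(master_file, word_no):
    """Collect (ID, address, word count) for records whose pointer is non-zero."""
    subfile = []
    for record_id, pointers in master_file:
        level_code, address, n_words = pointers[word_no]
        if level_code != "000000":      # zero pointer: category absent
            subfile.append((record_id, address, n_words))
    return subfile


notations_subfile = build_subfile(MASTER_FILE, 7)      # pointer word 7: notations
nomenclature_subfile = build_subfile(MASTER_FILE, 8)   # pointer word 8: nomenclature
print(notations_subfile)
print(nomenclature_subfile)
```

The scan touches only one fixed word per master record and never reads the variable-length information file, which is the property the text relies on.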


Let us now look at location MOLE for a typical Information File
item.


Table 5

TYPICAL INFORMATION FILE RECORD - EXAMPLE 1

Location                    Contents              Explanation

MOLE                        A0000007              ID number
MOLE + 1                    MASTER                Pointer to previous item
MOLE + 2                    010000  MOLE  07      Current item
MOLE + 3                    000000  0000  00      Next item →
MOLE + 4                    000000  0000  00      Next item ↓
MOLE + 5 thru MOLE + 11                           The molecular formula information

Words MOLE + 1 through MOLE + 4 are pointers.



Note from Table 1 that molecular formula is a one-entry level.
Therefore words MOLE+3 and MOLE+4 are pointer words of ZERO,
indicating the end of the chain. MASTER in MOLE+1 is a special symbol
to point back to the master file. Since the master file will be sorted and
rearranged, it was decided not to use the original address in pointers
back to the master file. Note that words at MOLE and MOLE+2 are
identical with words 1 and 6 respectively in Table 4.

Now let us consider the representation of an information record
in a more complex network such as intramuscular toxicity. (See Table 1,
category 9.d.2, Level Code 044200.)


Table 6

TYPICAL INFORMATION FILE RECORD - EXAMPLE 2

Location      Contents               Explanation

INMUS         A0000007               ID number
INMUS+1       044100  INVEN  09      Previous item
INMUS+2       044200  INMUS  00      Current item
INMUS+3       044210  RABB   05      Next item →
INMUS+4       044300  ORAL   11      Next item ↓

Words INMUS+1 through INMUS+4 are pointers.


The zeros in the third field of INMUS+2 indicate that the current
item is a stepping stone to items further on in the hierarchy. Here
there is no general information on intramuscular toxicity, but there is
information on intramuscular toxicity in rabbits located at RABB.
Further, there is information on oral toxicity located at ORAL.
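
To make the two-direction threading concrete, here is a small Python sketch that walks forward from INMUS. The INMUS record is copied from Table 6; the RABB and ORAL records are not shown in the report, so their pointers are assumptions (both are treated as ends of their chains), as are the dictionary layout and the function name `walk`.

```python
ZERO = ("000000", "0000", 0)   # a pointer word of zero: end of chain

# location -> pointer words as (level code, address, word count).
# "sub" is the next item in the → direction, "par" the next in the ↓ direction.
INFO_FILE = {
    "INMUS": {"previous": ("044100", "INVEN", 9),
              "current": ("044200", "INMUS", 0),
              "sub": ("044210", "RABB", 5),
              "par": ("044300", "ORAL", 11)},
    # Hypothetical leaf records; their forward pointers are assumed to be zero.
    "RABB": {"previous": ("044200", "INMUS", 0),
             "current": ("044210", "RABB", 5), "sub": ZERO, "par": ZERO},
    "ORAL": {"previous": ("044200", "INMUS", 0),
             "current": ("044300", "ORAL", 11), "sub": ZERO, "par": ZERO},
}


def walk(location):
    """Yield the current-item pointer of every record reachable from `location`."""
    record = INFO_FILE[location]
    yield record["current"]
    for direction in ("sub", "par"):       # sub-class first, then parallel class
        level_code, address, _ = record[direction]
        if level_code != "000000":
            yield from walk(address)


visited = list(walk("INMUS"))
with_data = [item for item in visited if item[2] > 0]
print(with_data)
```

The INMUS item itself is visited but carries no data words, which is the stepping-stone case described above.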

Note that the level code of the next item → and the level code of
the next item ↓ (here words INMUS+3 and INMUS+4) both can be converted
to the level code of the current item when the following rule is applied:
Starting from the right, decrease the first non-zero digit in the level
code by one. This will prove useful in threading backwards in the
hierarchy.
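
The rule is simple enough to state as code. The following Python sketch applies it to the level codes of Table 6; the function name `predecessor_level_code` is a label chosen for this example, not a name from the report.

```python
def predecessor_level_code(code: str) -> str:
    """Decrease the first non-zero digit, scanning from the right, by one."""
    digits = list(code)
    for i in range(len(digits) - 1, -1, -1):
        if digits[i] != "0":
            digits[i] = str(int(digits[i]) - 1)
            return "".join(digits)
    raise ValueError("level code has no non-zero digit")


# Next item → (rabbits) and next item ↓ (oral) both lead back to 044200.
print(predecessor_level_code("044210"))
print(predecessor_level_code("044300"))
```

Applied to 044200 itself, the same rule gives 044100, the level code held in the previous-item word INMUS+1 of Table 6.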


6.      CURRENT  STATUS 

Information has been punched on 8-channel teletype tapes in
ASCII (American Standard Code for Information Interchange) from the
data mentioned earlier in this report. A program has been written for
creating the master-file records from these data tapes. In addition,
several programs already operational are available from other projects
for manipulating files. These programs, written for the NBS Chemical
Structure Manipulation Projects, are as follows:

1.  A  system  for  a  chemical  structure  or  substructure  search 
based  on  the  Hayward  notation. 

2.  A  sort  routine  for  sorting  blocks  of  information  on  a  specific 
key  in  the  block. 

3.  A program for the derivation of molecular formulae and
other structure-related screening information from the
Hayward notation.

Programs  are  being  written  for  transforming  both  Hayward  and 
Wiswesser  notations  into  connection  tables  of  the  same  format.  Generic 
structures  of  the  Markush  type  and  partially  indeterminate  structures 
will  be  interfiled  with  such  connection  tables.     A  program  is  also  being 
written  for  a  chemical  structure  search  from  the  connection  tables  of 
both  specific  and  Markush  structures. 

An  executive  routine  is  planned  for  the  system  which  will  include 
modules  for  file  maintenance,    creation  of  subfiles,    and  the  exercise  of 
certain  control  functions. 

7.     CONCLUSIONS

Based  upon  the  initial  research  efforts   reported  here,    the  following 
activities   appear  to  provide  fruitful  areas  for  continuation  of  the 
research  in  the  organization  of  large  chemical  files. 

1.  Continue efforts to find ways of representing unstructured
information. The development of a systematic classification scheme may
be necessary.

2.  Make provision for indicating in the automated information file that
additional information is available elsewhere, and indicate where it
is located; e.g., microfilm files, hard-copy folders at the
reporting laboratory, or other.

3.  Design forms for reporting information, along with a comprehensive
set of instructions for proper recording of the information on the
forms. Uniform reporting on standard forms will tend to promote
greater consistency in the input data. This is an especially
important consideration if a central file is to be built which will serve
several agencies.

4.  Amend the CIDS list of hard-core items to include notation as a
separate item rather than as a part of nomenclature.

5.  Continue the development of programs for entering information into
the files and for searching the files.

6.  Develop a package of software containing service routines and
executive routines which will exercise control over file maintenance and
selection of appropriate search routines for the file.

Appendix B

TEST DATA SUPPLIERS

Name                                            No. of Forms

1.  U.S. Army Biological Laboratories                 49
    Fort Detrick
    (Disinfectant files)

2.  U.S. Army Biological Laboratories                222
    Fort Detrick
    (Crop files)

3.  Edgewood Arsenal                                  50

4.  Aberdeen Proving Grounds                          29

5.  Chemical Abstracts Service                       100

6.  University of Pennsylvania                       105

Appendix C-1

CIDS REGISTRY FILE - HARDCORE INPUT

[Form facsimile; field labels only]

1.   REPORTED BY
2.   COMPOUND SEC. CLASSIF.
3.   GROUP
     RELEASE RESTRICTIONS
5a.  LOCAL CTL. NO.
5b.  CTL. CHANGE
5c.  DATE (MO. DY. YR.)
6.   REGISTRY NO.
7.   MOLECULAR FORMULA
8.   NOMENCLATURE
9.   STRUCTURAL FORMULA

SMUEA FORM 13 (TEST)

Appendix C-2

CIDS REGISTRY FILE - HARDCORE INPUT CONTINUATION SHEET

[Form facsimile; field labels and legible sample entries only]

     LOCAL CTL. NO.
10.  KINDS OF DATA
         Boiling point
         Index of refraction
         Preparation
     BIBLIOGRAPHICAL CITATIONS

SMUEA FORM 13 (TEST)     PAGE __ OF __

